Algorithms for Generalized Topic Modeling
Authors
Abstract
Recently there has been significant activity in developing algorithms with provable guarantees for topic modeling. In this work we consider a broad generalization of the traditional topic modeling framework, where we no longer assume that words are drawn i.i.d. and instead view a topic as a complex distribution over sequences of paragraphs. Since one could not hope to even represent such a distribution in general (even if paragraphs are given using some natural feature representation), we aim instead to directly learn a predictor that given a new document, accurately predicts its topic mixture, without learning the distributions explicitly. We present several natural conditions under which one can do this from unlabeled data only, and give efficient algorithms to do so, also discussing issues such as noise tolerance and sample complexity. More generally, our model can be viewed as a generalization of the multi-view or co-training setting in machine learning.
Similar resources
Modeling the Time Windows Vehicle Routing Problem in Cross-Docking Strategy Using Two Meta-Heuristic Algorithms
In a cross-docking strategy, arriving products are immediately classified, sorted, and organized by destination. Among the problems related to this strategy, the vehicle routing problem (VRP) is especially important and receives particular attention in modern technology. This paper addresses a particular type of VRP, called VRPCDTW, considering a time limitation for each customer/retai...
Heuristic and exact algorithms for Generalized Bin Covering Problem
In this paper, we study the Generalized Bin Covering problem. For this problem, an exact algorithm is introduced that can find an optimal solution for small-scale instances. To find a near-optimal solution for large-scale instances, a heuristic algorithm is proposed. The efficiency of the heuristic algorithm is assessed by computational experiments.
COVERT Based Algorithms for Solving the Generalized Tardiness Flow Shop Problems
Four heuristic algorithms are developed for solving the generalized version of tardiness flow shop problems. We consider the generalized tardiness flow shop model with minimization of the total tardiness as its performance measure. We modify the concept of cost over time (COVERT) for the generalized version of the flow shop tardiness model and employ this concept for developing four algorithms....
Topic Modeling and Classification of Cyberspace Papers Using Text Mining
Global cyberspace networks provide individuals with platforms where they can interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...
A Set of Algorithms for Solving the Generalized Tardiness Flowshop Problems
This paper considers the problem of scheduling n jobs in the generalized tardiness flow shop problem with m machines. Seven algorithms are developed for finding a schedule with minimum total tardiness of jobs in the generalized flow shop problem. Two simple rules, the shortest processing time (SPT), and the earliest due date (EDD) sequencing rules, are modified and employed as the core of seque...
Publication date: 2017